Improving the readability of class lecture ASR results using a confusion network
Abstract
This paper presents a method for improving the readability of Automatic Speech Recognition (ASR) results for classroom lectures. Most previous research on improving the readability of recognition results has focused on manually transcribed text rather than on ASR output. Because classroom lectures contain many domain-dependent words and are delivered in a casual style, even state-of-the-art recognizers yield a word error rate of 30-50% on such speech. A method for improving the readability of ASR results therefore needs to be robust against recognition errors. We propose a novel method based on a machine translation model that takes as input a confusion network representing multiple ASR hypotheses, which makes it robust against recognition errors. Experimental results show that the proposed method outperforms the baselines in both automatic and manual evaluations.
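To make the term "confusion network" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a confusion network can be modeled as a sequence of slots, each holding competing word hypotheses with posterior probabilities, with an empty string standing for a skip (epsilon) arc. The names `Slot`, `best_path`, `count_paths`, and the example words and probabilities are all hypothetical and chosen only for illustration.

```python
from math import prod
from typing import List, Tuple

# Assumption: one slot = list of (word, posterior) alternatives.
# An empty word "" marks an epsilon arc, i.e. "no word in this slot".
Slot = List[Tuple[str, float]]


def best_path(network: List[Slot]) -> List[str]:
    """Pick the highest-posterior word in every slot, dropping epsilons."""
    words = []
    for slot in network:
        word, _ = max(slot, key=lambda arc: arc[1])
        if word:  # skip epsilon arcs
            words.append(word)
    return words


def count_paths(network: List[Slot]) -> int:
    """Number of distinct hypotheses the network encodes."""
    return prod(len(slot) for slot in network)


# Hypothetical network for a short lecture fragment (made-up values).
network: List[Slot] = [
    [("the", 0.9), ("a", 0.1)],
    [("uh", 0.6), ("", 0.4)],
    [("confusion", 0.7), ("conclusion", 0.3)],
    [("network", 0.8), ("net", 0.2)],
]

print(best_path(network))
print(count_paths(network))
```

The single best path here keeps the filled pause "uh", while the network as a whole still encodes alternatives without it. A model that reads the whole network, rather than only the best path, can in principle choose among those alternatives, which is the kind of robustness the abstract refers to.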
Related papers
Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge
This paper presents a novel method for improving the readability of automatic speech recognition (ASR) results for classroom lectures. Because speech in a classroom is spontaneous and contains many ill-formed utterances with various disfluencies, the ASR result should be edited to improve the readability before presenting it to users, by applying some operations such as removing disfluencies, d...
Improving the Readability of Class Lecture Automatic Speech Recognition Results Using Multiple Hypotheses
This paper presents a method for improving the readability of class lecture Automatic Speech Recognition (ASR) results, which hitherto have been difficult for humans to understand, even in the absence of recognition errors. This is because the speech in a class lecture is relatively casual and contains many ill-formed utterances with filled pauses, restarts, and so on. Recently there has been e...
Multiple-Pronunciation Lexical Modeling Based on Phoneme Confusion Matrix for Dysarthric Speech Recognition
In this paper, we propose speaker-dependent multiple-pronunciation lexical modeling for improving the performance of dysarthric automatic speech recognition (ASR). For each dysarthric speaker, a phoneme confusion matrix is first constructed from the results of phoneme recognition. Then, pronunciation variation rules are extracted by investigating the phoneme confusion matrix, and they are incor...
Pseudo-morpheme and Confusion Network Based Korean-English Statistical Spoken Language Translation System
In this demonstration, we present POSSLT (POSTECH Spoken Language Translation) for a Korean-English statistical spoken language translation (SLT) system using pseudo-morpheme and confusion network (CN) based technique. Like most other SLT systems, automatic speech recognition (ASR) and machine translation (MT) are coupled in a cascading manner in our SLT system. We used confusion network based ...
Computer Assisted Speech Transcription System for Efficient Speech Archive
This paper addresses computer assisted speech transcription (CAST) system for making archives such as meeting minutes and lecture notes. For such system, automatic speech recognition (ASR) is promising, but ASR errors are inevitable. Therefore, it is significant to design a good interface with which we can correct errors easily. Moreover, to make a better system, we should know what kind of rec...
Publication date: 2010